ValueError: Variable lstm_cell/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel already exists.

Preface

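I have recently been learning to build a language model with TensorFlow and ran into a problem with model reuse: when the training code and the prediction code live in the same file, TensorFlow reports that the lstm_cell kernel already exists.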

Error message

Traceback (most recent call last):
  File "anna_writer.py", line 274, in <module>
    samp = generate_samples(checkpoint, 20000, prime="The ")
  File "anna_writer.py", line 234, in generate_samples
    conf.lstm_size, conf.keep_prob, conf.grad_clip, False)
  File "anna_writer.py", line 74, in __init__
    self.add_lstm_cell()
  File "anna_writer.py", line 110, in add_lstm_cell
    initial_state=self.initial_state)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 574, in dynamic_rnn
    dtype=dtype)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 737, in _dynamic_rnn_loop
    swap_memory=swap_memory)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2770, in while_loop
    result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2599, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2549, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 722, in _time_step
    (output, new_state) = call_cell()
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 708, in <lambda>
    call_cell = lambda: cell(input_t, state)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 180, in __call__
    return super(RNNCell, self).__call__(inputs, state)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 441, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 916, in call
    cur_inp, new_state = cell(cur_inp, cur_state)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 752, in __call__
    output, new_state = self._cell(inputs, state, scope)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 180, in __call__
    return super(RNNCell, self).__call__(inputs, state)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 441, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 383, in call
    concat = _linear([inputs, h], 4 * self._num_units, True)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 1017, in _linear
    initializer=kernel_initializer)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1065, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 962, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 360, in get_variable
    validate_shape=validate_shape, use_resource=use_resource)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1405, in wrapped_custom_getter
    *args, **kwargs)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 183, in _rnn_get_variable
    variable = getter(*args, **kwargs)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 183, in _rnn_get_variable
    variable = getter(*args, **kwargs)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 352, in _true_getter
    use_resource=use_resource)
  File "/home/tf1.0/local/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 664, in _get_single_variable
    name, "".join(traceback.format_list(tb))))
ValueError: Variable lstm_cell/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

  File "anna_writer.py", line 110, in add_lstm_cell
    initial_state=self.initial_state)
  File "anna_writer.py", line 74, in __init__
    self.add_lstm_cell()
  File "anna_writer.py", line 172, in train
    conf.grad_clip, is_training=True)

Solution

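This problem troubled me for two days and I could not find a fix. Running the training model and the prediction model as separate programs raised no error, but running them together in one program did. Most of the search results I found deal with weight sharing, where the error message is: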

ValueError: Variable hello/rnn/basic_lstm_cell/weights already exists, disallowed. Did you mean to set reuse=True in VarScope?

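That error is somewhat different from mine. Those cases mostly arise when stacking several LSTM cells or building a bidirectional LSTM without defining a variable_scope for the variables: the cells are supposed to live in different scopes, but TensorFlow has no way of knowing that when it builds the graph. So when the declared cells are not the same cell, use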

with tf.variable_scope(name):

to give each one its own scope. The details depend on your actual situation.
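As a minimal sketch of how separate scopes keep variable names apart, the snippet below creates plain variables with tf.get_variable instead of real LSTM cells. The scope names cell_fw and cell_bw and the helper make_kernel are made up for illustration, and the import assumes a TensorFlow release that ships the tensorflow.compat.v1 module.

```python
import tensorflow.compat.v1 as tf  # assumption: graph-mode (TF 1.x style) API is available


def make_kernel(scope_name):
    """Create a variable named 'kernel' inside the given variable scope (hypothetical helper)."""
    with tf.variable_scope(scope_name):
        return tf.get_variable("kernel", shape=[4, 8])


# Work in an explicit graph so the example does not touch the default graph.
with tf.Graph().as_default():
    fw = make_kernel("cell_fw")  # hypothetical scope name
    bw = make_kernel("cell_bw")  # a different scope, so a different full name
    names = (fw.name, bw.name)

    # Re-entering an already used scope without reuse asks for the same
    # variable name again, which TensorFlow rejects.
    try:
        make_kernel("cell_fw")
        clash = None
    except ValueError as err:
        clash = str(err)

print(names)
print(clash)
```

The two distinct scopes produce two distinct variable names, while the repeated scope triggers the same kind of "already exists" ValueError discussed here.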

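I could not find an explanation online that matched my problem. After reading the error message closely and going through my code, I realized that when train and predict live in the same file, class language_model is instantiated twice. The program then needs two separate LSTM models, but both instances create variables under the same names in the same default graph and TensorFlow cannot tell them apart, so it reports that the kernel already exists. The kernel versus weights difference in the message seems to come from the variable naming used by newer TensorFlow releases; the underlying complaint, a duplicate variable name, is the same.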

TensorFlow has a reset_default_graph() function. I am not very familiar with Python multithreading, so here is the source:

def reset_default_graph():
    """Clears the default graph stack and resets the global default graph.

    NOTE: The default graph is a property of the current thread. This
    function applies only to the current thread. Calling this function while
    a `tf.Session` or `tf.InteractiveSession` is active will result in undefined
    behavior. Using any previously created `tf.Operation` or `tf.Tensor` objects
    after calling this function will result in undefined behavior.
    """

After I added a call to this function in my language_model class, the problem went away.

import tensorflow as tf


class language_model:
    def __init__(self, num_classes, batch_size=100, seq_length=50, learning_rate=0.01, num_layers=5, hidden_units=128,
                 keep_prob=0.8, grad_clip=5, is_training=True):
        # Without this call, keeping training and prediction in the same file raises the error.
        tf.reset_default_graph()
        self.learning_rate = learning_rate
        self.num_layers = num_layers
        self.hidden_units = hidden_units
        self.is_training = is_training
        self.keep_prob = keep_prob
        self.grad_clip = grad_clip
        self.num_classes = num_classes

        # Prediction generates one character at a time, so it uses batch and sequence size 1.
        if self.is_training:
            self.batch_size = batch_size
            self.seq_length = seq_length
        else:
            self.batch_size = 1
            self.seq_length = 1

        with tf.name_scope('add_input_layer'):
            self.add_input_layer()
        with tf.variable_scope('lstm_cell'):
            self.add_multi_cells()
        with tf.name_scope('build_output'):
            self.build_output()
        with tf.name_scope('cost'):
            self.compute_cost()
        with tf.name_scope('train_op'):
            self.train_op()
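Why the reset helps can be illustrated with a toy model in plain Python. This is not TensorFlow's implementation: ToyGraph, ToyLanguageModel, build_train_then_predict and the module-level functions are hypothetical stand-ins that only mimic the idea of a per-graph registry of variable names.

```python
class ToyGraph:
    """Toy stand-in for a graph: remembers every variable by its full name."""

    def __init__(self):
        self.variables = {}


_default_graph = ToyGraph()


def reset_default_graph():
    """Swap in an empty graph, forgetting every variable created earlier."""
    global _default_graph
    _default_graph = ToyGraph()


def get_variable(name):
    """Create a variable in the current default graph; refuse duplicate names."""
    store = _default_graph.variables
    if name in store:
        raise ValueError("Variable %s already exists, disallowed." % name)
    store[name] = object()
    return store[name]


KERNEL = "lstm_cell/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel"


class ToyLanguageModel:
    """Toy model whose constructor always asks for the same variable name."""

    def __init__(self, reset_graph):
        if reset_graph:
            reset_default_graph()
        self.kernel = get_variable(KERNEL)


def build_train_then_predict(reset_graph):
    """Build two models in a row; return 'ok' or the error text."""
    reset_default_graph()  # start each experiment from an empty graph
    try:
        ToyLanguageModel(reset_graph)  # plays the role of the training model
        ToyLanguageModel(reset_graph)  # plays the role of the prediction model
    except ValueError as err:
        return str(err)
    return "ok"


print(build_train_then_predict(reset_graph=False))
print(build_train_then_predict(reset_graph=True))
```

Without the reset, the second constructor asks for a name the graph already holds and fails; with the reset, each model starts from an empty graph. One consequence in real TensorFlow is that the prediction model's variables are new, so the trained weights have to be restored from a checkpoint, which is presumably what the checkpoint argument of generate_samples in the traceback is for.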

Aside: since TensorFlow 1.2, the way to define a multi-layer LSTM (MultiRNNCell) has changed considerably from earlier versions; see the PTB tutorial section "Stacking multiple LSTMs".

The code discussed in this post is at: github–anna

References

1. Fix for the error raised when creating RNN cells in TensorFlow 1.x

2. ValueError: Attempt to reuse RNNCell

3. How to reuse weights in MultiRNNCell?

4. ValueError: Attempt to reuse RNNCell with a different variable scope than its first use.